Results 1 - 20 of 54
1.
Sci Rep ; 14(1): 9516, 2024 04 25.
Article in English | MEDLINE | ID: mdl-38664448

ABSTRACT

Recent technologies, such as spatial transcriptomics, enable the measurement of gene expression at the single-cell level along with the spatial locations of these cells in the tissue. Spatial clustering of the cells provides valuable insights into the functional organization of the tissue. However, most such clustering methods involve some dimension reduction that leads to a loss of the inherent dependency structure among genes at any spatial location in the tissue. This destroys valuable insights into gene co-expression patterns, apart from possibly impacting spatial clustering performance. In spatial transcriptomics, the matrix-variate gene expression data, along with spatial coordinates of the single cells, provide information on both gene expression dependencies and cell spatial dependencies through their row and column covariances. In this work, we propose a joint Bayesian approach to simultaneously estimate these gene and spatial cell correlations. These estimates provide data summaries for downstream analyses. We illustrate our method with simulations and analysis of several real spatial transcriptomic datasets. Our work elucidates gene co-expression networks as well as clear spatial clustering patterns of the cells. Furthermore, our analysis reveals that downstream spatial-differential analysis may aid in the discovery of unknown cell types from known marker genes.


Subject(s)
Bayes Theorem , Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Cluster Analysis , Humans , Single-Cell Analysis/methods , Gene Regulatory Networks , Algorithms , Computer Simulation
2.
bioRxiv ; 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38562905

ABSTRACT

Epidemiological studies have shown that circadian rhythm disruption (CRD) caused by shift work or frequent jet lag is associated with the risk of breast cancer development. However, the role of CRD in mammary gland morphology and aggressive mammary tumorigenesis and the molecular mechanisms underlying CRD and cancer risk remain unknown. We found that chronic CRD disrupted mouse mammary gland morphology and increased tumor burden and lung metastasis in a genetically engineered mouse model of aggressive breast cancer and induced an immunosuppressive tumor microenvironment by enhancing leukocyte immunoglobulin-like receptor 4a (LILRB4a or LILRB4) expression. Moreover, CRD increased the M2 macrophage (anti-inflammatory) and regulatory T-cell populations but decreased the M1 macrophage (proinflammatory) populations. These findings identify and implicate LILRB4a as a link between CRD and aggressive mammary tumorigenesis. Teaser: Circadian rhythm disruption enhances aggressive mammary tumorigenesis by elevating LILRB4a expression.

3.
Sci Rep ; 14(1): 8595, 2024 04 13.
Article in English | MEDLINE | ID: mdl-38615084

ABSTRACT

The COVID-19 pandemic has profoundly reshaped human life. The development of COVID-19 vaccines has offered a semblance of normalcy. However, obstacles to vaccination have led to substantial loss of life and economic burdens. In this study, we analyze data from a prominent health insurance provider in the United States to uncover the underlying reasons behind the inability, refusal, or hesitancy to receive vaccinations. Our research proposes a methodology for pinpointing affected population groups and suggests strategies to mitigate vaccination barriers and hesitations. Furthermore, we estimate potential cost savings resulting from the implementation of these strategies. To achieve our objectives, we employed Bayesian data mining methods to reduce the dimension of the data and identify significant variables (features) influencing vaccination decisions. Comparative analysis shows that the Bayesian method outperforms cutting-edge alternatives.


Subject(s)
COVID-19 , Humans , Bayes Theorem , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines , Pandemics , Data Mining , Vaccination
4.
Biometrics ; 80(1), 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38364805

ABSTRACT

Survival models are used to analyze time-to-event data in a variety of disciplines. Proportional hazard models provide interpretable parameter estimates, but proportional hazard assumptions are not always appropriate. Non-parametric models are more flexible but often lack a clear inferential framework. We propose a Bayesian treed hazards partition model that is both flexible and inferential. Inference is obtained through the posterior tree structure, and flexibility is preserved by modeling the log-hazard function in each partition using a latent Gaussian process. An efficient reversible jump Markov chain Monte Carlo algorithm is obtained by marginalizing the parameters in each partition element via a Laplace approximation. Consistency properties for the estimator are established. The method can be used to help determine subgroups as well as prognostic and/or predictive biomarkers in time-to-event data. The method is compared with some existing methods on simulated data and a liver cirrhosis dataset.


Subject(s)
Algorithms , Proportional Hazards Models , Bayes Theorem , Markov Chains , Monte Carlo Method
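
As a rough illustration of the hazard-based view taken in the abstract above (item 4), the following Python sketch converts a log-hazard curve evaluated on a time grid into a survival curve through the cumulative hazard, S(t) = exp(-∫ h(u) du). The function name `survival_from_log_hazard`, the time grid, and the trapezoid rule are assumptions made for illustration only. The sketch does not implement the treed partition model, the latent Gaussian process, or the reversible jump sampler.

```python
import math


def survival_from_log_hazard(grid, log_hazard):
    """Return S(t) at each grid point from log-hazard values on that grid.

    Hypothetical helper: the cumulative hazard is approximated with the
    trapezoid rule, and S(t) = exp(-cumulative hazard).
    """
    survival = [1.0]  # S(t) = 1 at the first grid point
    cumulative = 0.0
    for i in range(1, len(grid)):
        h_left = math.exp(log_hazard[i - 1])
        h_right = math.exp(log_hazard[i])
        cumulative += 0.5 * (h_left + h_right) * (grid[i] - grid[i - 1])
        survival.append(math.exp(-cumulative))
    return survival


# Constant hazard of 0.5 on [0, 2]: the cumulative hazard at t = 2 is 1.
curve = survival_from_log_hazard([0.0, 1.0, 2.0], [math.log(0.5)] * 3)
print(curve)
```

With a constant hazard the trapezoid rule is exact, so the last value should agree with exp(-1) up to floating-point error.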
5.
Proteins ; 92(5): 637-648, 2024 May.
Article in English | MEDLINE | ID: mdl-38146101

ABSTRACT

Bacteriophages are the natural predators of bacteria and are abundant everywhere in nature. Lytic phages can specifically infect their bacterial host (through attachment to the receptor) and use their host replication machinery to replicate rapidly, a feature that enables them to kill disease-causing bacteria. Hence, phage attachment to the host bacteria is the first important step of the infection process. This study reports that the receptor responsible for the attachment of the Sfk20 phage to its host (Shigella flexneri 2a) could be an LPS. Phage Sfk20 bacteriolytic activity was examined for preliminary optimization of phage titer. The viability of phage Sfk20 under different saline conditions was assessed. The LC-MS/MS technique used here for detecting and identifying 40 Sfk20 phage proteins helped us to get an initial understanding of the structural landscape of phage Sfk20. From the identified proteins, six structurally significant proteins were selected for structure prediction using two neural network systems, AlphaFold2 and ESMFold, and one homology modeling software, Phyre2. The performance of these modeling systems was then compared using various metrics. We conclude from the available and generated information that AlphaFold2 and Phyre2 perform better than ESMFold for predicting Sfk20 phage protein structures.


Subject(s)
Bacteriophages , Shigella , Bacteriophages/genetics , Proteomics , Chromatography, Liquid , Tandem Mass Spectrometry , Bacteria
6.
BMC Microbiol ; 23(1): 324, 2023 11 03.
Article in English | MEDLINE | ID: mdl-37924001

ABSTRACT

BACKGROUND: Salmonella enterica serotype Typhi is one of the major pathogens causing typhoid fever and a public health burden worldwide. Recently, the increasing number of multidrug-resistant strains of Salmonella spp. has made it necessary to consider bacteriophages as a potential alternative to antibiotics for S. Typhi infection treatment. Salmonella phage STWB21, isolated from environmental water, has earlier been reported by our group to be effective as a safe biocontrol agent. In this study, we evaluated the efficacy of phage STWB21 in reducing the burden of salmonellosis in a mammalian host by inhibiting Salmonella Typhi invasion into the liver and spleen tissue. RESULTS: Phage treatment significantly improved the survival percentage of infected mice. This study also demonstrated that oral administration of phage treatment could be beneficial in both preventive and therapeutic treatment of salmonellosis caused by S. Typhi. Altogether, the results showed that the phage treatment could control tissue inflammation in mice before and after Salmonella infection. CONCLUSIONS: To the best of our knowledge, this is the first report of phage therapy in a mouse model against a clinically isolated Salmonella Typhi strain that includes direct visualization of histopathology and ultrathin section microscopy images from the liver and spleen sections.


Subject(s)
Bacteriophages , Phage Therapy , Salmonella Infections , Salmonella Phages , Typhoid Fever , Animals , Mice , Salmonella typhi , Bacterial Load , Typhoid Fever/therapy , Typhoid Fever/microbiology , Salmonella Infections/therapy , Mammals
7.
Genet Epidemiol ; 47(1): 95-104, 2023 02.
Article in English | MEDLINE | ID: mdl-36378773

ABSTRACT

The clustering of proteins is of interest in cancer cell biology. This article proposes a hierarchical Bayesian model for protein (variable) clustering hinging on correlation structure. Starting from a multivariate normal likelihood, we enforce the clustering through prior modeling using angle-based unconstrained reparameterization of correlations and assume a truncated Poisson distribution (to penalize a large number of clusters) as prior on the number of clusters. The posterior distributions of the parameters are not in explicit form, and a reversible jump Markov chain Monte Carlo based technique is used to simulate the parameters from the posteriors. The end products of the proposed method are the estimated cluster configuration of the proteins (variables) along with the number of clusters. The Bayesian method is flexible enough to cluster the proteins as well as estimate the number of clusters. The performance of the proposed method has been substantiated with extensive simulation studies and one protein expression dataset on hereditary disposition in breast cancer, where the proteins come from different pathways.


Subject(s)
Breast Neoplasms , Humans , Female , Bayes Theorem , Breast Neoplasms/genetics , Models, Genetic , Cluster Analysis , Markov Chains , Monte Carlo Method
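
The truncated Poisson prior on the number of clusters mentioned in the abstract above (item 7) can be sketched in a few lines of Python. The function name `truncated_poisson_pmf`, the truncation range (1 to `k_max`), and the rate value are assumptions chosen for illustration; the abstract does not state the truncation bounds or hyperparameters used by the authors.

```python
import math


def truncated_poisson_pmf(k, lam, k_min=1, k_max=10):
    """Poisson(lam) probability mass renormalized to {k_min, ..., k_max}.

    Hypothetical helper: the support bounds are illustrative assumptions.
    """
    if k < k_min or k > k_max:
        return 0.0
    weights = [
        math.exp(-lam) * lam ** j / math.factorial(j)
        for j in range(k_min, k_max + 1)
    ]
    return weights[k - k_min] / sum(weights)


# With a small rate, large cluster counts receive little prior mass.
for k in (1, 3, 5, 8):
    print(k, truncated_poisson_pmf(k, lam=2.0))
```

A small rate concentrates the prior on few clusters, which is one way such a prior can penalize a large number of clusters.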
8.
Front Microbiol ; 13: 980025, 2022.
Article in English | MEDLINE | ID: mdl-36071966

ABSTRACT

Salmonella is one of the common causal agents of bacterial gastroenteritis-related morbidity and mortality among children below 5 years and the elderly populations. Salmonellosis in humans is caused mainly by consuming contaminated food originating from animals. The genus Salmonella has several serovars, and many of them have recently been reported to be resistant to multiple drugs. Therefore, isolation of lytic Salmonella bacteriophages in search of bactericidal activity has gained importance. In this study, a Salmonella phage STWB21 was isolated from a lake water sample and found to be a novel lytic phage with promising potential against the host bacteria Salmonella typhi. However, some polyvalence was observed in its broad host range. In addition to S. typhi, the phage STWB21 was able to infect S. paratyphi, S. typhimurium, S. enteritidis, and a few other bacterial species such as Sh. flexneri 2a, Sh. flexneri 3a, and ETEC. The newly isolated phage STWB21 belongs to the Siphoviridae family with an icosahedral head and a long flexible non-contractile tail. Phage STWB21 is relatively stable under a wide range of pH (4-11) and temperatures (4°C-50°C) for different Salmonella serovars. The latent period and burst size of phage STWB21 against S. typhi were 25 min and 161 plaque-forming units per cell, respectively. Since Salmonella is a foodborne pathogen, the phage STWB21 was applied to treat a 24 h biofilm formed in onion and milk under laboratory conditions. A significant reduction was observed in the bacterial population of S. typhi biofilm in both cases. Phage STWB21 contained a dsDNA of 112,834 bp in length, and the GC content was 40.37%. Also, genomic analysis confirmed the presence of lytic genes and the absence of any lysogeny or toxin genes. Overall, the present study reveals phage STWB21 has a promising ability to be used as a biocontrol agent of Salmonella spp. and proposes its application in food industries.

9.
J Am Stat Assoc ; 116(535): 1075-1087, 2021.
Article in English | MEDLINE | ID: mdl-34898760

ABSTRACT

Estimating the marginal and joint densities of the long-term average intakes of different dietary components is an important problem in nutritional epidemiology. Since these variables cannot be directly measured, data are usually collected in the form of 24-hour recalls of the intakes, which show marked patterns of conditional heteroscedasticity. Significantly compounding the challenges, the recalls for episodically consumed dietary components also include exact zeros. The problem of estimating the density of the latent long-time intakes from their observed measurement error contaminated proxies is then a problem of deconvolution of densities with zero-inflated data. We propose a Bayesian semiparametric solution to the problem, building on a novel hierarchical latent variable framework that translates the problem to one involving continuous surrogates only. Crucial to accommodating important aspects of the problem, we then design a copula based approach to model the involved joint distributions, adopting different modeling strategies for the marginals of the different dietary components. We design efficient Markov chain Monte Carlo algorithms for posterior inference and illustrate the efficacy of the proposed method through simulation experiments. Applied to our motivating nutritional epidemiology problems, compared to other approaches, our method provides more realistic estimates of the consumption patterns of episodically consumed dietary components.

10.
Sci Rep ; 11(1): 19313, 2021 09 29.
Article in English | MEDLINE | ID: mdl-34588569

ABSTRACT

Shigellosis, caused by Shigella bacterial spp., is one of the leading causes of diarrheal morbidity and mortality. An increasing prevalence of multidrug-resistant Shigella species has revived the importance of bacteriophages as an alternative therapy to antibiotics. In this study, a novel bacteriophage, Sfk20, has been isolated from water bodies of a diarrheal outbreak area in Kolkata (India) with lytic activity against many Shigella spp. Phage Sfk20 showed a latent period of 20 min and a large burst size of 123 pfu per infected cell in a one-step growth analysis. Phage-host interaction and lytic activity were confirmed by phage attachment, intracellular phage development, and bacterial cell burst using ultrathin sectioning and TEM analysis. The genomic analysis revealed that the double-stranded DNA genome of Sfk20 contains 164,878 bp with 35.62% G + C content and 241 ORFs. Results suggested that phage Sfk20 is a member of the T4 myoviridae bacteriophage group. Phage Sfk20 has shown anti-biofilm potential against Shigella species. The results of this study imply that Sfk20 has good potential to be used as a biocontrol agent.


Subject(s)
Bacteriophage T4/isolation & purification , Dysentery, Bacillary/prevention & control , Phage Therapy/methods , Shigella/virology , Bacteriophage T4/genetics , Bacteriophage T4/ultrastructure , Dysentery, Bacillary/microbiology , Humans , India , Shigella/pathogenicity , Water Microbiology
11.
Bernoulli (Andover) ; 27(1): 637-672, 2021 Feb.
Article in English | MEDLINE | ID: mdl-34305432

ABSTRACT

Gaussian graphical models are a popular tool to learn the dependence structure in the form of a graph among variables of interest. Bayesian methods have gained in popularity in the last two decades due to their ability to simultaneously learn the covariance and the graph. There is a wide variety of model-based methods to learn the underlying graph assuming various forms of the graphical structure. Although decomposability is commonly imposed on the graph space for scalability of the Markov chain Monte Carlo algorithms, its possible implication on the posterior distribution of the graph is not clear. An open problem in Bayesian decomposable structure learning is whether the posterior distribution is able to select a meaningful decomposable graph that is "close" to the true non-decomposable graph, when the dimension of the variables increases with the sample size. In this article, we explore specific conditions on the true precision matrix and the graph, which result in an affirmative answer to this question with a commonly used hyper-inverse Wishart prior on the covariance matrix and a suitable complexity prior on the graph space. In the absence of structural sparsity assumptions, our strong selection consistency holds in a high-dimensional setting where p = O(n^α) for α < 1/3. We show that when the true graph is non-decomposable, the posterior distribution concentrates on a set of graphs that are minimal triangulations of the true graph.

12.
Adv Exp Med Biol ; 1332: 211-227, 2021.
Article in English | MEDLINE | ID: mdl-34251646

ABSTRACT

Measuring usual dietary intake in freely living humans is difficult to accomplish. As a part of our recent study, a food frequency questionnaire was completed by healthy adult men and women at days 0 and 90 of the study. Data from the food questionnaire were analyzed with a nutrient analysis program ( www.Harvardsffq.date ). Healthy men and women consumed protein as 19-20% and 17-19% of their total energy intakes, respectively, with animal protein representing about 75 and 70% of their total protein intakes, respectively. The intake of each nutritionally essential amino acid (EAA) by the participants exceeded that recommended for healthy adults with minimal physical activity. In all individuals, the dietary intake of leucine was the highest, followed by lysine, valine, and isoleucine in descending order, and the ingestion of amino acids that are synthesizable de novo in animal cells (AASAs) was about 20% greater than that of total EAAs. The intake of each AASA met that recommended for healthy adults with minimal physical activity. Intakes of some AASAs (alanine, arginine, aspartate, glutamate, and glycine) from a typical diet providing 90-110 g food protein/day do not meet the requirements of adults with intensive physical activity. Within the male or female group, there were no significant differences in the dietary intakes of all amino acids between days 0 and 90 of the study, and this was also true for nearly all other essential nutrients. Our findings will help to improve amino acid nutrition and health in both the general population and exercising individuals.


Subject(s)
Amino Acids , Diet , Adult , Eating , Energy Intake , Female , Humans , Male , Nutrients
13.
Chemometr Intell Lab Syst ; 212, 2021 May 15.
Article in English | MEDLINE | ID: mdl-35068632

ABSTRACT

BACKGROUND: The endogenous circadian clock, which controls daily rhythms in the expression of at least half of the mammalian genome, has a major influence on cell physiology. Consequently, disruption of the circadian system is associated with a wide range of diseases including cancer. While several circadian clock genes have been associated with cancer progression, little is known about the survival when two or more platforms are considered together. Our goal was to determine if survival outcomes are associated with circadian clock function. To accomplish this goal, we developed a Bayesian hierarchical survival model coupled with the global local shrinkage prior and applied this model to available RNASeq and Copy Number Variation data to select significant circadian genes associated with cancer progression. RESULTS: Using a Bayesian shrinkage approach with the Bayesian accelerated failure time (AFT) model, we showed that the circadian clock associated gene DEC1 is positively correlated with survival outcome in breast cancer patients. The R package circgene implementing the methodology is available at https://github.com/MAITYA02/circgene. CONCLUSIONS: The proposed Bayesian hierarchical model is the first shrinkage prior based model of its kind that integrates two omics platforms to identify the significant circadian gene for cancer survival.

14.
Biometrika ; 107(1): 205-221, 2020 Mar.
Article in English | MEDLINE | ID: mdl-33100350

ABSTRACT

We develop a Bayesian methodology aimed at simultaneously estimating low-rank and row-sparse matrices in a high-dimensional multiple-response linear regression model. We consider a carefully devised shrinkage prior on the matrix of regression coefficients which obviates the need to specify a prior on the rank, and shrinks the regression matrix towards low-rank and row-sparse structures. We provide theoretical support to the proposed methodology by proving minimax optimality of the posterior mean under the prediction risk in ultra-high dimensional settings where the number of predictors can grow sub-exponentially relative to the sample size. A one-step post-processing scheme induced by group lasso penalties on the rows of the estimated coefficient matrix is proposed for variable selection, with default choices of tuning parameters. We additionally provide an estimate of the rank using a novel optimization function achieving dimension reduction in the covariate space. We exhibit the performance of the proposed methodology in an extensive simulation study and a real data example.
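
To make the idea of row-wise variable selection with a group lasso penalty concrete, here is a minimal Python sketch of the standard group-lasso proximal step (row-wise soft thresholding) applied to a coefficient matrix. It is a generic illustration under stated assumptions, not the post-processing scheme of the paper: the function names `group_soft_threshold_rows` and `selected_rows`, the threshold `lam`, and the example matrix are all hypothetical.

```python
import math


def group_soft_threshold_rows(matrix, lam):
    """Shrink each row toward zero by `lam` in Euclidean norm.

    Rows whose norm is at most `lam` are set exactly to zero, which is
    how a group lasso penalty on rows can drop whole predictors.
    """
    result = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        result.append([scale * v for v in row])
    return result


def selected_rows(matrix):
    """Indices of rows (predictors) with at least one nonzero coefficient."""
    return [i for i, row in enumerate(matrix) if any(v != 0.0 for v in row)]


# Illustrative coefficient estimates: one strong row, one weak row, one null row.
estimate = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
shrunk = group_soft_threshold_rows(estimate, lam=2.5)
print(shrunk)
print(selected_rows(shrunk))
```

In this toy matrix only the first row survives the threshold, so only the first predictor would be retained.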

15.
PLoS One ; 15(10): e0238996, 2020.
Article in English | MEDLINE | ID: mdl-33095785

ABSTRACT

Recent developments in high-throughput methods have resulted in the collection of high-dimensional data types from multiple sources and technologies that measure distinct yet complementary information. Integrated clustering of such multiple data types or multi-view clustering is critical for revealing pathological insights. However, multi-view clustering is challenging due to the complex dependence structure between multiple data types, including directional dependency. Specifically, genomics data types have pre-specified directional dependencies known as the central dogma that describes the process of information flow from DNA to messenger RNA (mRNA) and then from mRNA to protein. Most of the existing multi-view clustering approaches assume an independent structure or pair-wise (non-directional) dependence between data types, thereby ignoring their directional relationship. Motivated by this, we propose a biology-inspired Bayesian integrated multi-view clustering model that uses an asymmetric copula to accommodate the directional dependencies between the data types. Via extensive simulation experiments, we demonstrate the negative impact of ignoring directional dependency on clustering performance. We also present an application of our model to a real-world dataset of breast cancer tumor samples collected from The Cancer Genome Atlas program and provide comparative results.


Subject(s)
Genomics/methods , Models, Statistical , Bayes Theorem , Breast Neoplasms/genetics , Cluster Analysis , Computer Simulation , Data Interpretation, Statistical , Databases, Genetic/statistics & numerical data , Female , Genomics/statistics & numerical data , Humans , Markov Chains , Normal Distribution
16.
PLoS One ; 15(7): e0236860, 2020.
Article in English | MEDLINE | ID: mdl-32726361

ABSTRACT

Currently, novel coronavirus disease 2019 (COVID-19) is a major threat to global health. The rapid spread of the virus has created a pandemic, and countries all over the world are struggling with a surge in COVID-19 infected cases. There are no drugs or other therapeutics approved by the US Food and Drug Administration to prevent or treat COVID-19, and information on the disease is very limited and scattered even where it exists. This motivates the use of data integration, combining data from diverse sources and eliciting useful information with a unified view of them. In this paper, we propose a Bayesian hierarchical model that integrates global data for real-time prediction of infection trajectory for multiple countries. Because the proposed model takes advantage of borrowing information across multiple countries, it outperforms an existing individual country-based model. Because a fully Bayesian approach has been adopted, the model provides a powerful predictive tool endowed with uncertainty quantification. Additionally, a joint variable selection technique has been integrated into the proposed modeling scheme, which aims to identify possible country-level risk factors for severe disease due to COVID-19.


Subject(s)
Betacoronavirus , Coronavirus Infections/epidemiology , Coronavirus Infections/transmission , Global Health/trends , Pneumonia, Viral/epidemiology , Pneumonia, Viral/transmission , Bayes Theorem , COVID-19 , Coronavirus Infections/virology , Humans , Models, Theoretical , Pandemics , Pneumonia, Viral/virology , Prognosis , Risk Factors , SARS-CoV-2 , Travel , Uncertainty
17.
Bioinformatics ; 36(13): 3951-3958, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32369552

ABSTRACT

MOTIVATION: It is well known that the integration of different data sources is reliable because of its potential to unveil new functionalities of the genomic expressions, which might be dormant in a single-source analysis. Moreover, different studies have justified the more powerful analyses of multi-platform data. Toward this, in this study, we consider the circadian genes' omics profile, such as copy number changes and RNA-sequence data along with their survival response. We develop a Bayesian structural equation model coupled with linear regressions and log normal accelerated failure-time regression to integrate the information between these two platforms to predict the survival of the subjects. We place conjugate priors on the regression parameters and derive the Gibbs sampler using their conditional distributions. RESULTS: Our extensive simulation study shows that the integrative model provides a better fit to the data than its closest competitor. The analyses of glioblastoma cancer data and the breast cancer data from TCGA, the largest genomics and transcriptomics database, support our findings. AVAILABILITY AND IMPLEMENTATION: The developed method is wrapped in an R package available at https://github.com/MAITYA02/semmcmc. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Genomics , Bayes Theorem , Computational Biology , Humans , Latent Class Analysis , Software
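
The log normal accelerated failure-time regression named in the abstract above (item 17) assumes log T = x'β + σε with standard normal ε. The Python sketch below evaluates the corresponding log-likelihood with right censoring: observed events contribute the log density, censored observations contribute the log survival probability. The function names, argument layout, and toy inputs are assumptions for illustration and are not the API of the semmcmc package. The priors, the structural equation part, and the Gibbs sampler are not shown.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_aft_loglik(times, events, covariates, beta, sigma):
    """Log-likelihood of a log-normal AFT model with right censoring.

    events[i] is 1 for an observed event and 0 for a censored time.
    Hypothetical helper written for illustration only.
    """
    total = 0.0
    for t, d, x in zip(times, events, covariates):
        mu = sum(b * xi for b, xi in zip(beta, x))
        z = (math.log(t) - mu) / sigma
        if d == 1:
            # Log density of T: log phi(z) - log(sigma * t)
            total += -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z - math.log(sigma * t)
        else:
            # Log survival probability: log(1 - Phi(z))
            total += math.log(1.0 - normal_cdf(z))
    return total


# Toy inputs: two subjects, one covariate, the second subject censored.
print(lognormal_aft_loglik([1.0, 2.0], [1, 0], [[0.0], [1.0]], [0.5], 1.0))
```

In a Bayesian fit this quantity would be combined with priors on β and σ, which the sketch leaves out.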
18.
Biometrics ; 76(1): 316-325, 2020 03.
Article in English | MEDLINE | ID: mdl-31393003

ABSTRACT

Accurate prognostic prediction using molecular information is a challenging area of research, which is essential to develop precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, like the survival time of patients. There are considerable statistical and computational challenges due to the large dimension of the problems. Furthermore, data are available for different tumor types; hence data integration for various tumors is desirable. Having censored survival outcomes adds one more level of complexity to the inferential procedure. We develop Bayesian hierarchical survival models, which accommodate all the challenges mentioned here. We use the hierarchical Bayesian accelerated failure time model for survival regression. Furthermore, we assume a sparse horseshoe prior distribution for the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods have been used to analyze data from the recently curated "The Cancer Proteome Atlas" (TCPA), which contains reverse-phase protein arrays-based high-quality protein expression data as well as detailed clinical annotation, including survival times. Our simulation and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors with the correlated prior structures.


Subject(s)
Biometry/methods , Neoplasms/metabolism , Neoplasms/mortality , Proteome/metabolism , Proteomics/statistics & numerical data , Bayes Theorem , Computer Simulation , Data Interpretation, Statistical , Humans , Kidney Neoplasms/metabolism , Kidney Neoplasms/mortality , Markov Chains , Models, Statistical , Monte Carlo Method , Prognosis , Protein Array Analysis/statistics & numerical data , Survival Analysis
19.
J Mach Learn Res ; 21(79): 1-47, 2020.
Article in English | MEDLINE | ID: mdl-34305477

ABSTRACT

Graphical models are ubiquitous tools to describe the interdependence between variables measured simultaneously such as large-scale gene or protein expression data. Gaussian graphical models (GGMs) are well-established tools for probabilistic exploration of dependence structures using precision matrices and they are generated under a multivariate normal joint distribution. However, they suffer from several shortcomings since they are based on Gaussian distribution assumptions. In this article, we propose a Bayesian quantile based approach for sparse estimation of graphs. We demonstrate that the resulting graph estimation is robust to outliers and applicable under general distributional assumptions. Furthermore, we develop efficient variational Bayes approximations to scale the methods for large data sets. Our methods are applied to a novel cancer proteomics dataset wherein multiple proteomic antibodies are simultaneously assessed on tumor samples using reverse-phase protein arrays (RPPA) technology.
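
Quantile-based estimation is commonly built on the check (pinball) loss, and the small Python sketch below shows why such a loss is less sensitive to outliers than squared error. Whether the paper above (item 19) uses exactly this loss is not stated in the abstract, so this is a generic illustration under that assumption. The function names and the toy data are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss for residual u at quantile level tau in (0, 1)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_quantile_by_check_loss(values, tau):
    """Pick the data point that minimizes the total check loss.

    Brute-force search over the observed values, written for clarity
    rather than efficiency.
    """
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))


# One extreme value barely moves the tau = 0.5 estimate (the median),
# whereas it would pull the sample mean far to the right.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(empirical_quantile_by_check_loss(data, 0.5))
print(sum(data) / len(data))
```

The minimizer at level 0.5 is the sample median, which stays near the bulk of the data even when one observation is extreme.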

20.
Cancer Inform ; 18: 1176935119871933, 2019.
Article in English | MEDLINE | ID: mdl-31488946

ABSTRACT

Long non-coding RNAs (lncRNAs) are a large and diverse class of transcribed RNAs, which have been shown to play a significant role in developing cancer. In this study, we apply an integrative modeling framework to integrate the DNA copy number variation (CNV), lncRNA expression, and downstream target protein expression to predict patient survival in breast cancer. We develop a 3-stage model combining a mechanical model (lncRNA regressed on CNV and target proteins regressed on lncRNA) and a clinical model (survival regressed on estimated effects from the mechanical models). Using lncRNAs (such as HOTAIR and MALAT1) along with their CNV, target protein expressions, and survival outcomes from The Cancer Genome Atlas (TCGA) database, we show that predicted mean square error and integrated Brier score (IBS) are both lower for the proposed 3-step integrated model than for the 2-step model. Therefore, the integrative model has better predictive ability than the 2-step model, which does not consider target protein information.
